Smart Paper Notes

A Taxonomy of Anomalies in Log Data

Thorsten Wittkopp , Philipp Wiesner , Dominik Scheinert , Odej Kao

Category: Machine Learning

2021-11-26

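Log data anomaly detection is a core component in the area of artificial intelligence for IT operations. However, the large number of existing methods makes it hard to choose the right approach for a specific system. A better understanding of the different kinds of anomalies, and of which algorithms are suitable for detecting them, would support researchers and IT operators. Although a common taxonomy of anomalies already exists, it has not yet been applied specifically to log data to point out the characteristics and peculiarities of this domain. In this paper, we present a taxonomy for different kinds of log data anomalies and introduce a method for analyzing such anomalies in labeled datasets. We apply our taxonomy to the three common benchmark datasets Thunderbird, Spirit, and BGL, and train five state-of-the-art unsupervised anomaly detection algorithms to evaluate their performance in detecting different kinds of anomalies. Our results show that the most common anomaly type is also the easiest to predict. Moreover, deep learning-based approaches outperform data mining-based approaches on all anomaly types, especially when it comes to detecting contextual anomalies.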

Translated by Google Translate

On the Potential of Execution Traces for Batch Processing Workload Optimization in Public Clouds

Dominik Scheinert , Alireza Alamgiralem , Jonathan Bader , Jonathan Will , Thorsten Wittkopp , Lauritz Thamsen

Category: Machine Learning

2021-11-16

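With the growing amount of data, data processing workloads and the management of their resource usage are becoming increasingly important. Since managing a dedicated infrastructure is infeasible or uneconomical in many situations, users increasingly execute their workloads in the cloud. As configuring workloads and resources is often challenging, various methods have been proposed that either quickly profile towards a good configuration or determine one based on data from previous runs. Still, the performance data needed to train such methods is often lacking and costly to collect. In this paper, we propose a collaborative approach for sharing anonymized workload execution traces among users, mining them for general patterns, and exploiting clusters of historical workloads for future optimizations. We evaluate our prototype implementation for mining workload execution graphs on a publicly available trace dataset and demonstrate the predictive value of workload clusters determined through traces alone.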

LogLAB: Attention-Based Labeling of Log Data Anomalies via Weak Supervision

Thorsten Wittkopp , Philipp Wiesner , Dominik Scheinert , Alexander Acker

Category: Machine Learning

2021-11-02

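With the increasing scale and complexity of cloud operations, automated detection of anomalies in monitoring data such as logs will be an essential part of managing future IT infrastructures. However, many methods based on artificial intelligence, such as supervised deep learning models, require large amounts of labeled training data to perform well. In practice, such data is rarely available because labeling log data is expensive, time-consuming, and requires a deep understanding of the underlying system. We present LogLAB, a novel modeling approach for automatically labeling log messages without manual work by experts. Our method relies on estimated failure time windows provided by monitoring systems to produce precisely labeled datasets in retrospect. It is based on the attention mechanism and uses a custom objective function for weakly supervised deep learning that accounts for imbalanced data. Our evaluation shows that LogLAB consistently outperforms nine benchmark approaches across three different datasets and maintains an F1 score above 0.98 even with large failure time windows.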

Learned Systems Security

Roei Schuster , Jin Peng Zhou , Thorsten Eisenhofer , Paul Grubbs , Nicolas Papernot

Category: Machine Learning

2022-12-20

A learned system uses machine learning (ML) internally to improve performance. We can expect such systems to be vulnerable to some adversarial-ML attacks. Often, the learned component is shared between mutually-distrusting users or processes, much like microarchitectural resources such as caches, potentially giving rise to highly-realistic attacker models. However, compared to attacks on other ML-based systems, attackers face a level of indirection as they cannot interact directly with the learned model. Additionally, the difference between the attack surface of learned and non-learned versions of the same system is often subtle. These factors obfuscate the de-facto risks that the incorporation of ML carries. We analyze the root causes of potentially-increased attack surface in learned systems and develop a framework for identifying vulnerabilities that stem from the use of ML. We apply our framework to a broad set of learned systems under active development. To empirically validate the many vulnerabilities surfaced by our framework, we choose three of them and implement and evaluate exploits against prominent learned-system instances. We show that the use of ML caused leakage of past queries in a database, enabled a poisoning attack that causes exponential memory blowup in an index structure and crashes it in seconds, and enabled index users to snoop on each other's key distributions by timing queries over their own keys. We find that adversarial ML is a universal threat against learned systems, point to open research gaps in our understanding of learned-systems security, and conclude by discussing mitigations, while noting that data leakage is inherent in systems whose learned component is shared between multiple parties.

Cutting Plane Selection with Analytic Centers and Multiregression

Mark Turner , Timo Berthold , Mathieu Besançon , Thorsten Koch

Category: Machine Learning

2022-12-14

Cutting planes are a crucial component of state-of-the-art mixed-integer programming solvers, with the choice of which subset of cuts to add being vital for solver performance. We propose new distance-based measures to qualify the value of a cut by quantifying the extent to which it separates relevant parts of the relaxed feasible set. For this purpose, we use the analytic centers of the relaxation polytope or of its optimal face, as well as alternative optimal solutions of the linear programming relaxation. We assess the impact of the choice of distance measure on root node performance and throughout the whole branch-and-bound tree, comparing our measures against those prevalent in the literature. Finally, by a multi-output regression, we predict the relative performance of each measure, using static features readily available before the separation process. Our results indicate that analytic center-based methods help to significantly reduce the number of branch-and-bound nodes needed to explore the search space and that our multiregression approach can further improve on any individual method.
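The abstract does not spell out its distance measures. As a rough illustration of what a distance-based cut score can look like, the sketch below computes the signed Euclidean distance from a reference point to the hyperplane of a cut a·x <= b. The function name `distance_to_cut`, the sample point, and the sample cut are hypothetical, and the paper's actual measures (based on analytic centers and alternative LP optima) may be defined differently.

```python
import math


def distance_to_cut(point, coeffs, rhs):
    """Signed Euclidean distance from `point` to the hyperplane coeffs . x = rhs.

    Positive when `point` violates the cut coeffs . x <= rhs (it is cut off),
    negative when the point satisfies the cut strictly.
    """
    norm = math.sqrt(sum(a * a for a in coeffs))
    if norm == 0.0:
        raise ValueError("cut has an all-zero coefficient vector")
    activity = sum(a * x for a, x in zip(coeffs, point))
    return (activity - rhs) / norm


# Hypothetical data: a reference point (for example an LP optimum)
# and the cut x + y <= 3.
reference_point = (2.5, 1.5)
cut_coeffs, cut_rhs = (1.0, 1.0), 3.0

cut_off_distance = distance_to_cut(reference_point, cut_coeffs, cut_rhs)
interior_distance = distance_to_cut((1.0, 1.0), cut_coeffs, cut_rhs)
```

Under this convention a larger positive value means the cut removes the reference point by a wider margin; swapping the reference point (LP optimum, analytic center, and so on) yields a different score for the same cut.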

Semantic-Aware Environment Perception for Mobile Human-Robot Interaction

Thorsten Hempel , Marc-André Fiedler , Aly Khalifa , Ayoub Al-Hamadi , Laslo Dinges

Category: Robotics | Artificial Intelligence | Computer Vision

2022-11-07

Current technological advances open up new opportunities for bringing human-machine interaction to a new level of human-centered cooperation. In this context, a key issue is the semantic understanding of the environment, which enables more complex interactions for mobile robots and easier communication with humans. Prerequisites are the vision-based registration of semantic objects and humans, where the latter are further analyzed as potential interaction partners. Despite significant research achievements, the reliable and fast registration of semantic information remains a challenging task for mobile robots in real-world scenarios. In this paper, we present a vision-based system for mobile assistive robots that enables semantic-aware environment perception without additional a-priori knowledge. We deploy our system on a mobile humanoid robot, which enables us to test our methods in real-world applications.

Avast-CTU Public CAPE Dataset

Branislav Bosansky , Dominik Kouba , Ondrej Manhal , Thorsten Sick , Viliam Lisy , Jakub Kroustek , Petr Somol

Category: Artificial Intelligence | Machine Learning

2022-09-06

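Only a limited amount of publicly available data exists to support research on malware analysis techniques. In particular, there are virtually no publicly available datasets generated by rich sandboxes such as Cuckoo/CAPE. The benefit of using dynamic sandboxes is the realistic simulation of file execution on the target machine and the resulting log of that execution. The machine can be infected by malware, so there is a good chance of capturing malicious behavior in the execution logs, allowing researchers to study such behavior in detail. Although the subsequent analysis of log information is extensively covered in industrial cybersecurity backends, to our knowledge only limited effort has been invested in academia to advance such log analysis capabilities using state-of-the-art techniques. We make this sample dataset available to support the design of new machine learning methods for malware detection, especially for the automatic detection of generic malicious behavior. The dataset was collected in cooperation between Avast Software and the Czech Technical University - AI Center (AIC).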

Augmented cross-selling through explainable AI -- a case from energy retailing

Felix Haag , Konstantin Hopf , Pedro Menelau Vasconcelos , Thorsten Staake

Category: Machine Learning | Artificial Intelligence

2022-08-24

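Advances in machine learning (ML) have led to strong interest in using this technology to support decision making. While complex ML models provide predictions that are often more accurate than those of traditional tools, such models often hide the reasoning behind a prediction from their users, which can lead to lower adoption and a lack of insight. Motivated by this tension, research has put forth explainable artificial intelligence (XAI) techniques that uncover the patterns discovered by ML. Despite the high hopes for both ML and XAI, there is little empirical evidence of their benefits to traditional businesses. To this end, we analyze data on 220,185 customers of an energy retailer, predict cross-purchases with up to 86% correctness (AUC), and show that the XAI method SHAP provides explanations that hold for actual buyers. We further outline implications for research in information systems, XAI, and relationship marketing.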

Behavior Trees and State Machines in Robotics Applications

Razan Ghzouli , Swaib Dragule , Thorsten Berger , Einar Broch Johnsen , Andrzej Wasowski

Category: Robotics

2022-08-08

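Autonomous robots combine a variety of skills to form increasingly complex behaviors, called missions. While the skills are often programmed at a relatively low level of abstraction, their coordination is architecturally separated and often expressed in higher-level languages or frameworks. State machines have been the go-to language for decades, but recently the language of behavior trees has gained attention among roboticists. Originally designed for computer games to model autonomous actors, behavior trees offer an extensible tree-based representation of missions and are praised for supporting modular design and code reuse. However, even though several implementations of the language are in use, little is known about its usage and scope in the real world. How do the concepts offered by behavior trees relate to traditional languages, such as state machines? How are behavior tree and state machine concepts used in applications? We present a study of the key language concepts in behavior trees and their use in real-world robotic applications. We identify behavior tree languages and compare their semantics to the most well-known behavior modeling language in robotics: state machines. We mine open-source repositories for robotics applications that use these languages and analyze this usage. We find similarities between the two behavior modeling languages in terms of language design and in how they are used in open-source projects to meet the needs of the robotics domain. We contribute a dataset of real-world behavior models, hoping to inspire the community to use and further develop this language, its associated tools, and analysis techniques.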

FourCastNet: Accelerating Global High-Resolution Weather Forecasting using Adaptive Fourier Neural Operators

Thorsten Kurth , Shashank Subramanian , Peter Harrington , Jaideep Pathak , Morteza Mardani , David Hall , Andrea Miele , Karthik Kashinath , Animashree Anandkumar

Category: Artificial Intelligence | Computer Vision | Machine Learning

2022-08-08

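Extreme weather amplified by climate change is causing increasingly devastating impacts across the globe. The current use of physics-based numerical weather prediction (NWP) limits accuracy due to high computational cost and strict time-to-solution limits. We report that a data-driven deep learning Earth system emulator, FourCastNet, can predict global weather and generate forecasts five orders of magnitude faster than NWP while approaching state-of-the-art accuracy. FourCastNet is optimized and scales efficiently on three supercomputing systems (Selene, Perlmutter, and JUWELS Booster) up to 3,808 NVIDIA A100 GPUs, attaining 140.8 petaFLOPS in mixed precision (11.9% of peak at that scale). The time-to-solution for training FourCastNet, measured on JUWELS Booster on 3,072 GPUs, is 67.4 minutes, and in inference the time-to-solution is faster than that of state-of-the-art NWP. FourCastNet produces accurate instantaneous weather predictions a week in advance, enables enormous ensembles that better capture weather extremes, and supports higher global forecast resolution.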
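The throughput figures quoted in the abstract can be sanity-checked with a little arithmetic. The sketch below assumes that the 140.8 petaFLOPS and the 11.9% of peak both refer to the 3,808-GPU run; the variable names are illustrative only.

```python
# Back-of-envelope check of the throughput figures quoted in the abstract.
# Assumption: all three numbers describe the same 3,808-GPU run.
achieved_pflops = 140.8    # mixed-precision throughput
fraction_of_peak = 0.119   # "11.9% of peak at that scale"
num_gpus = 3808            # NVIDIA A100 GPUs

# Peak throughput implied by the two quoted figures, in petaFLOPS.
implied_peak_pflops = achieved_pflops / fraction_of_peak

# Per-GPU figures in teraFLOPS (1 petaFLOPS = 1000 teraFLOPS).
implied_peak_per_gpu_tflops = implied_peak_pflops * 1000 / num_gpus
achieved_per_gpu_tflops = achieved_pflops * 1000 / num_gpus

print(f"implied aggregate peak: {implied_peak_pflops:.0f} PFLOPS")
print(f"implied peak per GPU:   {implied_peak_per_gpu_tflops:.1f} TFLOPS")
print(f"achieved per GPU:       {achieved_per_gpu_tflops:.1f} TFLOPS")
```

Under that assumption the implied per-GPU peak comes out near 311 teraFLOPS, close to the commonly quoted 312 teraFLOPS dense mixed-precision peak of an A100, which suggests the quoted figures are mutually consistent.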
